DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices
Abstract
Deploying deep neural networks on mobile devices is a challenging task. Current model compression methods such as matrix decomposition effectively reduce the deployed model size, but still cannot satisfy real-time processing requirements. This paper first shows that the major obstacle is the excessive execution time of non-tensor layers, such as pooling and normalization, which have no tensor-like trainable parameters. This motivates us to design a novel acceleration framework, DeepRebirth, which "slims" existing consecutive and parallel non-tensor and tensor layers. Layer slimming is applied to two kinds of substructure: (a) streamline slimming, which merges consecutive non-tensor and tensor layers vertically; and (b) branch slimming, which merges non-tensor and tensor branches horizontally. These optimizations significantly accelerate model execution and also greatly reduce run-time memory cost, since the slimmed architecture contains fewer hidden layers. To minimize accuracy loss, the parameters of the newly generated layers are learned with layer-wise fine-tuning, based on both theoretical analysis and empirical verification. In the experiments, DeepRebirth achieves more than 3x speed-up and 2.5x run-time memory saving on GoogLeNet with only a 0.4% drop in top-5 accuracy on ImageNet. Furthermore, when combined with other model compression techniques, DeepRebirth achieves an average inference time of 65 ms on the CPU of a Samsung Galaxy S6 with 86.5% top-5 accuracy, 14% faster than SqueezeNet, which reaches only 80.5% top-5 accuracy.
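The streamline-slimming idea can be illustrated with a toy sketch. This is a hypothetical example, not the authors' implementation: it replaces a 1-D convolution followed by max pooling (a tensor layer plus a non-tensor layer) with a single strided convolution whose kernel covers the same input window, then learns the new layer's weights so that its output mimics the original block on sample inputs. Ordinary least squares stands in for the layer-wise fine-tuning described in the abstract; the kernel `w`, the input sizes, and the helper names (`original_block`, `slimmed_block`, `fit_slimmed_layer`) are assumptions made for illustration.

```python
import numpy as np


def windows(x, k, stride):
    """Sliding windows of length k over each row of x: (batch, n_out, k)."""
    n_out = (x.shape[1] - k) // stride + 1
    idx = np.arange(n_out)[:, None] * stride + np.arange(k)[None, :]
    return x[:, idx]


def conv1d(x, w, stride=1):
    """Valid 1-D cross-correlation of each row of x with kernel w."""
    return windows(x, len(w), stride) @ w


def maxpool1d(y, size=2):
    """Non-overlapping max pooling along the last axis."""
    n = y.shape[1] // size
    return y[:, : n * size].reshape(y.shape[0], n, size).max(axis=2)


def original_block(x, w):
    # Tensor layer (3-tap conv, stride 1) followed by a non-tensor layer (pool 2).
    return maxpool1d(conv1d(x, w), size=2)


def slimmed_block(x, v, b):
    # Single merged layer: a 4-tap conv with stride 2 covers the same input
    # window as conv(3) + pool(2), so the output lengths match.
    return conv1d(x, v, stride=2) + b


def fit_slimmed_layer(x, w, k=4, stride=2):
    """Least-squares stand-in for fine-tuning the merged layer (hypothetical)."""
    target = original_block(x, w).reshape(-1)
    feats = windows(x, k, stride).reshape(-1, k)
    design = np.hstack([feats, np.ones((feats.shape[0], 1))])  # bias column
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[:-1], coef[-1]


rng = np.random.default_rng(0)
w = np.array([0.5, -1.0, 0.25])            # hypothetical trained conv kernel
x_train = rng.standard_normal((2000, 34))  # hypothetical calibration inputs
x_test = rng.standard_normal((500, 34))

v, b = fit_slimmed_layer(x_train, w)
y_ref = original_block(x_test, w)
y_new = slimmed_block(x_test, v, b)

r2 = 1 - np.mean((y_ref - y_new) ** 2) / np.var(y_ref)
print("merged kernel:", np.round(v, 3), "bias:", round(float(b), 3))
print(f"held-out R^2 of slimmed layer vs. original block: {r2:.2f}")
```

Because max pooling is nonlinear, a single linear layer can only approximate the original two-layer block. For symmetric random inputs like these, the least-squares kernel converges to the average of the two pooled convolution outputs plus a constant offset. In a real network the merged layer would instead be trained by gradient descent on real activations, and the slimmed model executes one layer where the original executed two.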
Related papers
DeepRebirth: A General Approach for Accelerating Deep Neural Network Execution on Mobile Devices
Deploying deep neural networks on mobile devices is a challenging task due to computational complexity and memory intensity. Existing works address this problem by reducing model size with weight compression methods based on dimension reduction (e.g., SVD, Tucker decomposition, and quantization). However, the execution speed of these compressed models is still far below the real-time processing re...
Accelerating Convolutional Neural Networks for Continuous Mobile Vision via Cache Reuse
Convolutional Neural Network (CNN) is the state-of-the-art algorithm in many mobile vision fields. It is also applied to many vision tasks, such as face detection and augmented reality, on mobile devices. Although deep CNN models achieve high accuracy, today's commercial mobile devices are often short on processing capacity and battery to continuously carry out such CNN-driv...
MCDNN: An Execution Framework for Deep Neural Networks on Resource-Constrained Devices
Deep Neural Networks (DNNs) have become the computational tool of choice for many applications relevant to mobile devices. However, given their high memory and computational demands, running them on mobile devices has required expert optimization or custom hardware. We present a framework that, given an arbitrary DNN, compiles it down to a resource-efficient variant at a modest loss in accuracy. ...
DXTK: Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit
Deep learning is having a transformative effect on how sensor data are processed and interpreted. As a result, it is becoming increasingly feasible to build sensor-based computational models that are much more robust to real-world noise and complexity than previously possible. It is paramount that these innovations reach mobile and embedded devices that often rely on understanding and reacting ...
Energy and Performance Efficient Computation Offloading for Deep Neural Networks in a Mobile Cloud Computing Environment
In today’s computing technology scene, mobile devices are considered computationally weak, while large cloud servers are capable of handling expensive workloads; therefore, intensive computing tasks are typically offloaded to the cloud. Recent advances in learning techniques have enabled Deep Neural Networks (DNNs) to be deployed in a wide range of applications. Commercial speech-based in...
Journal title:
- CoRR
Volume: abs/1708.04728
Publication date: 2017